black-box algorithm
Stabilizing black-box algorithms through task-oriented randomization
Abstract--As black-box models become foundational to modern research, ensuring their stability is paramount for the realization [...]. The inherent diversity of inputs--ranging from structured Gaussian distributions to complex data with unknown structures--poses a significant challenge: how to stabilize black-box outputs while effectively leveraging available prior information. This paper introduces a task-oriented randomization methodology that adaptively tailors its strategy to the underlying generative mechanisms of the input data, specifically addressing unstructured complexities. [...] comprehensive suite of stability guarantees is proposed. Beyond establishing rigorous theoretical foundations for stability, the [...]
[...] solution that can be applied across a wide range of scientific and industrial domains. Notwithstanding its widespread application, the framework exhibits certain shortcomings when dealing with complex datasets. First, standard resampling schemes often fail to account for the underlying data structures; as a result, the drawn samples cannot reflect the true data distribution, thereby [...]. Second, effective sampling requires prior knowledge of the distribution, which is often unattainable in practical environments.
Best-of-N Jailbreaking
We introduce Best-of-N (BoN) Jailbreaking, a simple black-box algorithm that jailbreaks frontier AI systems across modalities. BoN Jailbreaking works by repeatedly sampling variations of a prompt with a combination of augmentations---such as random shuffling or capitalization for textual prompts---until a harmful response is elicited. We find that BoN Jailbreaking achieves high attack success rates (ASRs) on closed-source language models, such as 89% on GPT-4o and 78% on Claude 3.5 Sonnet when sampling 10,000 augmented prompts. Further, it is similarly effective at circumventing state-of-the-art open-source defenses like circuit breakers and reasoning models like o1. BoN also seamlessly extends to other modalities: it jailbreaks vision language models (VLMs) such as GPT-4o and audio language models (ALMs) like Gemini 1.5 Pro, using modality-specific augmentations. BoN reliably improves when we sample more augmented prompts. Across all modalities, ASR, as a function of the number of samples (N), empirically follows power-law-like behavior for many orders of magnitude. BoN Jailbreaking can also be composed with other black-box algorithms for even more effective attacks---combining BoN with an optimized prefix attack achieves up to a 35% increase in ASR. Overall, our work indicates that, despite their capability, language models are sensitive to seemingly innocuous changes to inputs, which attackers can exploit across modalities.
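The scaling claim above can be made concrete with a small curve-fitting sketch. The abstract only says that ASR follows "power-law-like behavior" in N; the specific form used here, -log(ASR) = a * N^(-b), is an assumption for illustration, as are the function names `fit_power_law` and `predict_asr`. The data points are synthetic, generated from a known curve so the fit can be checked, and are not results from the paper.

```python
import math

def predict_asr(n, a, b):
    """ASR under the assumed form -log(ASR) = a * n**(-b)."""
    return math.exp(-a * n ** (-b))

def fit_power_law(ns, asrs):
    """Fit -log(ASR) = a * N**(-b) by least squares in log-log space.

    Taking logs gives log(-log(ASR)) = log(a) - b * log(N), a straight line.
    Every ASR value must lie strictly between 0 and 1.
    Returns the pair (a, b).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(-math.log(asr)) for asr in asrs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope

# Synthetic (N, ASR) points drawn from a known curve, NOT measured data.
true_a, true_b = 3.0, 0.3
ns = [1, 10, 100, 1000, 10000]
asrs = [predict_asr(n, true_a, true_b) for n in ns]

a, b = fit_power_law(ns, asrs)
print(f"fitted a={a:.3f}, b={b:.3f}")
```

With real measurements, a fit on small N could be used to extrapolate the ASR expected at larger N, which is only as reliable as the assumed functional form.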
The Dangers of AI
We've all seen movies where AI takes over the world (I, Robot is probably my favorite), but what are its potential harms today? Let's try to understand where these dangers can arise in the first place. Modern AI relies on various black-box algorithms that produce the desired results, but the reasoning behind why they perform as well as or better than humans may be lost in the process or rarely evaluated. Now you might be wondering: if we control the results, how is it going to take over the world? The answer is that it probably won't. What can go wrong, though, is that AI obtains the results companies or organizations want by crossing moral or legal boundaries without anybody knowing or realizing it, not even the companies themselves.
Explainable artificial intelligence: Easier said than done - STAT
The growing use of artificial intelligence in medicine is paralleled by growing concern among many policymakers, patients, and physicians about the use of black-box algorithms. In a nutshell, it's this: We don't know what these algorithms are doing or how they are doing it, and since we aren't in a position to understand them, they can't be trusted and shouldn't be relied upon. A new field of research, dubbed explainable artificial intelligence (XAI), aims to address these concerns. As we argue in Science magazine, together with our colleagues I. Glenn Cohen and Theodoros Evgeniou, this approach may not help and, in some instances, can hurt. Artificial intelligence (AI) systems, especially machine learning (ML) algorithms, are increasingly pervasive in health care.
Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals
There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context has focused on the attribution of responsibility for an algorithm's decisions to its inputs, wherein responsibility is typically approached as a purely associational concept. In this paper, we propose a principled causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lie probabilistic contrastive counterfactuals, a concept that can be traced back to philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influences of a variable on decisions made by an algorithm, and provide actionable recourse for individuals negatively affected by the algorithm's decision.
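To give a feel for what a contrastive counterfactual score measures, here is a minimal sketch under strong simplifying assumptions: the black box is deterministic, and features do not causally influence one another, so intervening on one feature leaves the others unchanged. Under those assumptions only direct influence is captured; the indirect influences mentioned in the abstract would require a causal model, which this sketch omits. The names `necessity_score`, `sufficiency_score`, and `toy_model`, and the toy data, are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

def necessity_score(model: Callable[[Row], int], rows: List[Row],
                    feature: str, actual: int, contrast: int,
                    positive: int = 1) -> float:
    """Among rows with feature == actual and a positive decision, the share
    whose decision would stop being positive had feature been `contrast`."""
    group = [r for r in rows if r[feature] == actual and model(r) == positive]
    if not group:
        return float("nan")
    flipped = sum(model({**r, feature: contrast}) != positive for r in group)
    return flipped / len(group)

def sufficiency_score(model: Callable[[Row], int], rows: List[Row],
                      feature: str, actual: int, contrast: int,
                      positive: int = 1) -> float:
    """Among rows with feature == contrast and a non-positive decision, the
    share whose decision would become positive had feature been `actual`."""
    group = [r for r in rows if r[feature] == contrast and model(r) != positive]
    if not group:
        return float("nan")
    gained = sum(model({**r, feature: actual}) == positive for r in group)
    return gained / len(group)

def toy_model(r: Row) -> int:
    """Hypothetical black box: approve only high income with no default."""
    return 1 if r["income_high"] == 1 and r["has_default"] == 0 else 0

# Toy population covering every feature combination once.
rows = [
    {"income_high": 1, "has_default": 0},
    {"income_high": 1, "has_default": 1},
    {"income_high": 0, "has_default": 0},
    {"income_high": 0, "has_default": 1},
]

nec = necessity_score(toy_model, rows, "income_high", actual=1, contrast=0)
suf = sufficiency_score(toy_model, rows, "income_high", actual=1, contrast=0)
print(nec, suf)
```

In this toy setup, high income is fully necessary for approval but only partly sufficient, because applicants with a default are still rejected. A sufficiency score of this kind is what would inform recourse: it estimates how often changing the feature would actually reverse a negative decision.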
Australian Authorities Want an AI To Settle Your Divorce
For better or worse, there's a good chance your current love life owes something to automation. Even if you're just hooking up with the occasional Tinder fling (which if you are, no judgment), you're still turning to Tinder's black-box algorithms to pick out that fling for you before turning to more black-box algorithms to pick out the best dingy bar to meet them at before turning to more black-box algorithms to figure out what, exactly, should be your date night lewk. If things get serious further down the line, you might turn to another black-box algorithm to plan your entire damn wedding for you. And if it turns out you got married for all the wrong reasons, it turns out there's another set of black boxes you can plug your details into to settle the details of your divorce. Known as "amica," the service was rolled out yesterday by the Australian government as a way to let soon-to-be-exes "make parenting arrangements" and "divide their money and property" without having to go through the hassle of hiring a lawyer to do the heavy lifting.
Making AI Human Again: The importance of Explainable AI (XAI)
As the explosion of algorithms and Artificial Intelligence (AI) continues across business and society, we are already facing ethical, regulatory and business-critical issues around how we use the output from machine learning. The issue of who evaluates the decisions made by AI -- if anybody -- is becoming more urgent. "Programs are not products, they are processes… we will never be sure what a process does until we run it -- as occurred recently when Amazon's facial recognition software misidentified 28 members of Congress as criminal suspects." Of course, the history of technology is the story of augmenting human limitations with machinery or tools that enable us to do more than our bodies or minds let us. But are we on the verge of losing control of this vital process?
Using Machine Learning to Guide Cognitive Modeling: A Case Study in Moral Reasoning
Agrawal, Mayank, Peterson, Joshua C., Griffiths, Thomas L.
Large-scale behavioral datasets enable researchers to use complex machine learning algorithms to better predict human behavior, yet this increased predictive power does not always lead to a better understanding of the behavior in question. In this paper, we outline a data-driven, iterative procedure that allows cognitive scientists to use machine learning to generate models that are both interpretable and accurate. We demonstrate this method in the domain of moral decision-making, where standard experimental approaches often identify relevant principles that influence human judgments, but fail to generalize these findings to "real world" situations that place these principles in conflict. The recently released Moral Machine dataset allows us to build a powerful model that can predict the outcomes of these conflicts while remaining simple enough to explain the basis behind human decisions.